SEPP: SATé-Enabled Phylogenetic Placement

Authors

  • Siavash Mirarab
  • Nam Nguyen
  • Tandy J. Warnow
Abstract

We address the problem of phylogenetic placement, in which the objective is to insert short molecular sequences (called query sequences) into an existing phylogenetic tree and alignment on full-length sequences for the same gene. Phylogenetic placement has the potential to provide information beyond pure "species identification" (i.e., the association of metagenomic reads with existing species), because it can also give information about the evolutionary relationships between these query sequences and known species. Approaches for phylogenetic placement have been developed that operate in two steps: first, each query sequence is aligned to the alignment of the full-length sequences, and then that alignment is used to find the optimal location in the phylogenetic tree for the query sequence. Recent methods of this type include HMMALIGN+EPA, HMMALIGN+pplacer, and PaPaRa+EPA.

We report on a study evaluating phylogenetic placement methods on biological and simulated data. This study shows that these methods have extremely good accuracy and computational tractability under conditions where the input contains a highly accurate alignment and tree for the full-length sequences, and the set of full-length sequences is sufficiently small and not too evolutionarily diverse; however, we also show that under other conditions accuracy declines and the computational requirements for memory and time exceed acceptable limits.

We present SEPP, a general "boosting" technique to improve the accuracy and/or speed of phylogenetic placement techniques. The key algorithmic aspect of this booster is a dataset decomposition technique from SATé, a method that uses an iterative divide-and-conquer technique to co-estimate alignments and trees on large molecular sequence datasets. We show that SATé-boosting improves HMMALIGN+pplacer: it places short sequences more accurately when the set of input sequences has a large evolutionary diameter, and it produces placements of comparable accuracy in a fraction of the time for easier cases. SEPP software and the datasets used in this study are all freely available at http://www.cs.utexas.edu/users/phylo/software/sepp/submission.
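The abstract does not spell out the dataset decomposition, so the following is only a minimal sketch of one plausible reading: recursively cutting the backbone tree at the edge that splits the remaining leaves most evenly, until every subset holds at most a chosen number of sequences. The function names, the edge-list tree representation, and the `max_size` parameter are assumptions made for illustration and are not taken from the SEPP software.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def side_nodes(adj, start, blocked):
    """Nodes reachable from `start` without passing through `blocked`.

    In a tree this is exactly the component containing `start`
    after the edge (start, blocked) is removed.
    """
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt != blocked and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def decompose(edges, leaves, max_size):
    """Split `leaves` into subsets of at most `max_size` taxa (hypothetical helper).

    `edges` must describe a tree connecting all of `leaves`.
    Each step tries every edge, so this is quadratic and meant
    only to illustrate the idea on small trees.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    leaves = set(leaves)
    if len(leaves) <= max_size:
        return [sorted(leaves)]

    adj = build_adjacency(edges)
    best = None
    for u, v in edges:
        u_side = side_nodes(adj, u, v)
        n_u = len(u_side & leaves)
        # Difference between the leaf counts on the two sides of the edge.
        imbalance = abs(2 * n_u - len(leaves))
        if best is None or imbalance < best[0]:
            best = (imbalance, u_side)

    _, u_side = best
    left_edges = [(a, b) for a, b in edges if a in u_side and b in u_side]
    right_edges = [(a, b) for a, b in edges if a not in u_side and b not in u_side]
    return (decompose(left_edges, leaves & u_side, max_size)
            + decompose(right_edges, leaves - u_side, max_size))
```

In a pipeline of the kind the abstract describes, each resulting subset would presumably receive its own alignment model, so that a query sequence is aligned against a small, less diverse group rather than the full reference set; that step is outside this sketch.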

Similar resources

TIPP: taxonomic identification and phylogenetic profiling

MOTIVATION: Abundance profiling (also called 'phylogenetic profiling') is a crucial step in understanding the diversity of a metagenomic sample, and one of the basic techniques used for this is taxonomic identification of the metagenomic reads. RESULTS: We present taxon identification and phylogenetic profiling (TIPP), a new marker-based taxon identification and abundance profiling method. TIPP...

Hepatic selenoprotein P (SePP) expression restores selenium transport and prevents infertility and motor-incoordination in Sepp-knockout mice.

SePP (selenoprotein P) is central for selenium transport and distribution. Targeted inactivation of the Sepp gene in mice leads to reduced selenium content in plasma, kidney, testis and brain. Accordingly, activities of selenoenzymes are reduced in Sepp(-/-) organs. Male Sepp(-/-) mice are infertile. Unlike selenium deficiency, Sepp deficiency leads to neurological impairment with ataxia and se...

Phylogenetic relationships of anamorphic form of some Pleosporalean genera based on analysis of ITS rDNA and RPB2

Pleosporaceae is an important Dothideomycetes family. To elucidate relationships among some selected anamorphic pleosporalean taxa, their Internal Transcribed Spacer (ITS) and RNA polymerase second largest subunit (RPB2) were sequenced and compared. Phylogenetic analyses of both ITS and RPB2 regions were almost similar and generally congruent with previously described phylogenies and morphology...

Report on the Third Static Analysis Tool Exposition (SATE 2010). Editors: Vadim Okun

The NIST Software Assurance Metrics And Tool Evaluation (SAMATE) project conducted the third Static Analysis Tool Exposition (SATE) in 2010 to advance research in static analysis tools that find security defects in source code. The main goals of SATE were to enable empirical research based on large test sets, encourage improvements to tools, and promote broader and more rapid adoption of tools ...

Deletion of selenoprotein P alters distribution of selenium in the mouse.

Selenoprotein P (Se-P) contains most of the selenium in plasma. Its function is not known. Mice with the Se-P gene deleted (Sepp(-/-)) were generated. Two phenotypes were observed: 1) Sepp(-/-) mice lost weight and developed poor motor coordination when fed diets with selenium below 0.1 mg/kg, and 2) male Sepp(-/-) mice had sharply reduced fertility. Weanling male Sepp(+/+), Sepp(+/-), and Sepp...

Journal:
  • Pacific Symposium on Biocomputing

Publication date: 2012